perm filename ABST2[X,ALS] blob sn#075317 filedate 1973-12-04 generic text, type T, neo UTF8
00100			The Amanuensis Speech Recognition System
00200	
00300					by
00400	
00500				James L. Hieronymus
00600				  Neil J. Miller
00700				 Arthur L. Samuel
00750	
00760		    Stanford A.I. Laboratory, Stanford University
00780	
00782				     Abstract
00785	
00800	The Amanuensis speech recognition system  under  development  at  the
00900	Stanford  A.I.  Laboratory  is a signature-table oriented system that
01000	uses machine learning techniques and attempts to  extract  a  maximum
01100	amount  of linguistic information from the acoustic speech signal. It
01200	differs from the system previously reported in a number of  important
01300	respects:
01400		1) A new acoustic  segmenter  is  used  to  extract  prosodic
01500	features  from the acoustic input and to isolate regions for especial
01600	treatment.
01700		2)  Parameters  for  all  voiced regions are determined pitch
01800	synchronously using a new glottal pulse locator.
01900		3)  Use  is  made of information from both the steady or near
02000	steady state regions and from the transition regions.
02100		4)  Speaker  normalization  is  done,  partly  by formula and
02200	partly by signature tables.
02300		5) Greater use is made of the redundancy of speech to improve
02400	the recognition.
02500		6)  Improvements  have been made in the design and use of the
02600	signature tables both to improve their  accuracy  and  to  achieve  a
02700	better  compromise between the need for excessive amounts of training
02800	material and the need for smoothing.
02900		7)  A  bootstrapping  technique  is  under study which should
03000	greatly reduce the amount of hand segmentation necessary  to  provide
03100	the annotated training material.
03200		8) Several possible output streams of phonemes  are  produced
03300	with  probability  ratings  for both the complete streams and for the
03400	individual phonemes, so that it should not be necessary  ever  to  go
03500	back  to  the original acoustic input data to resolve ambiguities and
03600	to incorporate syntactic, semantic and contextual information in  the
03700	decision process.
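
As a rough illustration of the last item above (several candidate
phoneme streams, each carrying a rating for the whole stream and for
its individual phonemes), the Python sketch below ranks alternative
streams built from per-segment phoneme candidates. It is not the
Amanuensis method. The abstract does not say how stream ratings are
computed, so the sketch assumes that a stream's rating is the product
of independent per-phoneme probabilities. The function name
best_streams, the input format, the phoneme labels and the numbers are
all hypothetical.

```python
import heapq
from itertools import product
from math import prod


def best_streams(segments, k=3):
    """Return the k highest-rated phoneme streams.

    segments: list of dicts, one per acoustic segment, mapping a
    candidate phoneme label to its probability rating (an assumed,
    illustrative input format).

    Each result is (stream_rating, phonemes, phoneme_ratings), so both
    the stream-level and the phoneme-level ratings stay available to a
    later syntactic or semantic stage.
    """
    scored = []
    # Enumerate every combination of one candidate per segment.
    for combo in product(*(seg.items() for seg in segments)):
        phonemes = [label for label, _ in combo]
        ratings = [p for _, p in combo]
        # Assumption: segments are independent, so ratings multiply.
        scored.append((prod(ratings), phonemes, ratings))
    return heapq.nlargest(k, scored, key=lambda s: s[0])


# Made-up candidates for a three-segment utterance.
segments = [
    {"k": 0.7, "g": 0.3},
    {"ae": 0.6, "eh": 0.4},
    {"t": 0.8, "d": 0.2},
]

for rating, phonemes, ratings in best_streams(segments):
    print(f"{rating:.3f}", " ".join(phonemes), ratings)
```

Exhaustive enumeration grows exponentially with the number of
segments, so it is only suitable for a toy example like this one. A
practical system would prune low-rated candidates as it goes.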